Abstract



The $\ell_2$ flattening lemma of Johnson and Lindenstrauss [JL84] is a powerful tool for dimension
reduction. It has been conjectured that the target dimension bounds can be refined and bounded
in terms of the intrinsic dimensionality of the data set (for example, the doubling dimension).
One such problem was proposed by Lang and Plaut [LP01] (see also [GKL03, Mat02, ABN08]),
and is still open. We prove another result in this line of work:

The snowflake metric $d^{1/2}$ of a doubling set $S \subset \ell_2$ can be embedded with arbitrarily
low distortion into $\ell_2^D$, for dimension $D$ that depends solely on the doubling constant
of the metric.

In fact, the target dimension is polylogarithmic in the doubling constant. Our techniques
are robust and extend to the more difficult spaces $\ell_1$ and $\ell_\infty$, although the dimension bounds
here are quantitatively inferior to those for $\ell_2$.
1 Introduction



Dimension reduction, in which high-dimensional data is faithfully represented in a low-dimensional
space, is a key tool in several fields. Probably the most prevalent mathematical formulation of
this problem considers the data to be a set $S \subset \ell_2$, and the goal is to map the points in $S$ into a
low-dimensional $\ell_2^k$. (Here and throughout, $\ell_p^k$ denotes the space $\mathbb{R}^k$ endowed with the $\ell_p$-norm; $\ell_p$ is
the infinite-dimensional counterpart of all sequences that are $p$-th power summable.) A celebrated
result in this area is the so-called JL-Lemma:



Theorem 1.1 (Johnson and Lindenstrauss [JL84]). For every $n$-point subset $S \subset \ell_2$ and every
$0 < \varepsilon < 1$, there is a mapping $\Psi : S \to \ell_2^k$ that preserves all interpoint distances in $S$ within
factor $1+\varepsilon$, and has target dimension $k = O(\varepsilon^{-2} \log n)$.
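As a rough illustration of the kind of linear map behind Theorem 1.1, the following sketch projects points with a random Gaussian matrix. This is a standard variant chosen here for simplicity, not necessarily the construction of [JL84]; the function names and the seed parameter are assumptions made for this example, and no particular accuracy is guaranteed for a single random draw.

```python
import math
import random

def jl_project(points, k, seed=0):
    """Map each d-dimensional point to k dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # Entries have variance 1/k, so squared norms are preserved in expectation.
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]

def dist(u, v):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def max_relative_error(points, images):
    """Largest relative change of a pairwise distance under the map points -> images."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = dist(points[i], points[j])
            if original > 0:
                worst = max(worst, abs(dist(images[i], images[j]) / original - 1.0))
    return worst
```

Calling `max_relative_error(points, jl_project(points, k))` for growing `k` gives an empirical feel for how the error shrinks as the target dimension increases.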

This positive result is remarkably strong; in fact the map is an easy-to-describe linear
transformation. It has found many applications, and has become a basic tool. It is natural to seek
the optimal (minimum) target dimension $k$ possible in this theorem. The logarithmic dependence on
$n = |S|$ is necessary, as can be easily seen by volume arguments, and Alon [Alo03] further proved
that the JL-Lemma is optimal up to a factor of $O(\log\frac{1}{\varepsilon})$. These lower bounds are existential,
meaning that there are sets S for which the result of the JL-Lemma cannot be improved. However, 
it may still be possible to significantly reduce the dimension for sets S that are "intrinsically" 
low-dimensional. This raises the interesting and fundamental question of bounding k in terms of 
parameters other than n, which we formalize next. 

We recall some basic terminology involving metric spaces. The doubling constant of a metric
$(M, d_M)$, denoted $\lambda(M)$, is the smallest $\lambda \ge 1$ such that every (metric) ball in $M$ can be covered
by at most $\lambda$ balls of half its radius. We say that $M$ is doubling if its doubling constant $\lambda(M)$ is
bounded independently of $|M|$. It is sometimes more convenient to refer to $\dim(M) = \log_2 \lambda(M)$,
which is known as the doubling dimension of $M$ [GKL03]. An embedding of one metric space
$(M, d_M)$ into another $(N, d_N)$ is a map $\Psi : M \to N$. We say that $\Psi$ attains distortion $D' \ge 1$ if $\Psi$
preserves every pairwise distance within factor $D'$, namely, there is a scaling factor $s > 0$ such that

$$1 \le \frac{d_N(\Psi(x), \Psi(y))}{s \cdot d_M(x, y)} \le D', \qquad \forall x, y \in M.$$
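For a finite point set, the distortion defined above can be computed directly, since the best scaling factor $s$ is the smallest ratio of image distance to original distance. The sketch below assumes the map is given as two parallel lists and is injective on the points (otherwise the smallest ratio is zero and the division fails); the function names are illustrative.

```python
import math

def distortion(points, images, d_source, d_target):
    """Smallest D' such that, for some s > 0, 1 <= d_target / (s * d_source) <= D'."""
    ratios = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            base = d_source(points[i], points[j])
            if base > 0:
                ratios.append(d_target(images[i], images[j]) / base)
    # Taking s = min(ratios) normalizes the smallest ratio to exactly 1.
    return max(ratios) / min(ratios)

def euclid(u, v):
    """Euclidean distance between two coordinate tuples."""
    return math.dist(u, v)
```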



The following problem was posed independently by [LP01] and [GKL03] (see also [Mat02, ABN08]):

Question 1. Does every doubling subset $S \subset \ell_2$ embed with distortion $D'$ into $\ell_2^D$ for $D, D'$ that
depend only on $\lambda(S)$?

This question is still open and seems very challenging. Resolving it in the affirmative seems
to require completely different techniques than the JL-Lemma, since such an embedding cannot
be achieved by a linear map [IN07, Remark 4.1]. For algorithmic applications, it would be ideal
to resolve positively an even stronger variant of Question 1, where the target distortion $D'$ is an
absolute constant independent of $\lambda(S)$, or even $1+\varepsilon$ as in the JL-Lemma. This stronger version
has not been excluded, and is still open as well.

1.1 Results and Techniques 

We present dimension reduction results for doubling subsets in Euclidean spaces. In fact, we devise
a robust framework that extends even to the spaces $\ell_1$ and $\ell_\infty$. Our results incur constant or
$1+\varepsilon$ distortion, with target dimension that depends not on $|S|$ but rather on $\dim(S)$ (and this
dependence is unavoidable due to volume arguments). We remark that such guarantees, namely very low
distortion and dimension, are highly sought-after in metric embeddings, but rarely achieved. We
state our results in the context of finite metrics (subsets of $\ell_p$); they extend to infinite subsets of
$L_p$ via standard arguments.



Snowflake Embedding. Our primary embedding achieves distortion $1+\varepsilon$ for the snowflake
metric $d^\alpha$ of an input metric $d$ (i.e. the snowflake metric is obtained by raising every pairwise
distance to power $0 < \alpha < 1$). It is instructive to view $\alpha$ as a fixed constant, say $\alpha = 1/2$. We
prove the following in Section 3.3. Throughout, we use $\tilde{O}(f)$ to denote $f \cdot (\log f)^{O(1)}$.



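Theorem 1.2. Let $0 < \varepsilon < 1/4$ and $0 < \alpha < 1$. Every finite subset $S \subset \ell_2$ admits an embedding
$\Phi : S \to \ell_2^k$ for $k = \tilde{O}(\varepsilon^{-4}(1-\alpha)^{-1}(\dim S)^2)$, such that

$$1 \le \frac{\|\Phi(x) - \Phi(y)\|_2}{\|x - y\|_2^\alpha} \le 1 + \varepsilon, \qquad \forall x, y \in S.$$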
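The snowflake operation itself is easy to state in code. The sketch below (helper names are illustrative assumptions) builds $d^\alpha$ from a distance function and checks the triangle inequality by brute force on a small point set; the same check shows that squaring distances, which corresponds to an exponent above 1, can violate it.

```python
import itertools

def snowflake(d, alpha):
    """Return the snowflake metric d^alpha for a distance function d and 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return lambda x, y: d(x, y) ** alpha

def satisfies_triangle_inequality(points, d, tol=1e-12):
    """Brute-force check of d(x, z) <= d(x, y) + d(y, z) over all ordered triples."""
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in itertools.permutations(points, 3))
```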



Notice the difference between our theorem and Question 1: Our embedding achieves better
distortion $1+\varepsilon$, but it applies to the (often easier) snowflake metric $d^\alpha$. Our result is also related
to the following theorem of Assouad [Ass83]: For every doubling metric $(M, d)$ and every $0 < \alpha < 1$,
the snowflake metric $d^\alpha$ embeds into $\ell_2^D$ with distortion $D'$, where $D, D'$ depend only on $\lambda(M)$ and
$\alpha$. Note the theorem's vast generality: the only requirements are the doubling property (which by
volume arguments is an obvious necessity) and that the data be a metric. This comes at the nontrivial
price that the distortion achieved depends on $\lambda(M)$. Compared to Assouad's theorem, our embedding
achieves a much stronger distortion $1+\varepsilon$, but requires the additional assumption that the input
metric is Euclidean.

Previously, Theorem 1.2 was only known to hold in the special case where $S = \mathbb{R}$ (the real line).
For this case, Kahane [Kah81] and Talagrand [Tal92] exhibit a $1+\varepsilon$ distortion embedding of the
snowflake metric $|x-y|^\alpha$ into $\ell_2^k$. Kahane [Kah81] shows an embedding of $|x-y|^{1/2}$ (also known
as Wilson's helix) into dimension $k = O(1/\varepsilon)$, while Talagrand [Tal92] shows how to embed every
snowflake metric $|x-y|^\alpha$, $\alpha \in (0,1)$, with dimension $k = O(K(\alpha)/\varepsilon^2)$ (which is larger). Thus, our
theorem can be viewed as a generalization of [Kah81, Tal92] to arbitrary doubling subsets of $\ell_2$ (or
other $\ell_p$), albeit with a somewhat worse dependence on $\varepsilon$.



Embedding for a Single Scale. Most of our technical work is devoted to designing an embedding
that preserves distances at a single scale $r > 0$, while still maintaining a one-sided Lipschitz
condition for all scales. We now state our most basic new result, which achieves only a constant
distortion (for the desired scale).

Theorem 1.3. For every scale $r > 0$ and every $0 < \delta < 1/4$, every finite set $S \subset \ell_2$ admits an
embedding $\varphi : S \to \ell_2^k$ for $k = \tilde{O}(\log\frac{1}{\delta} \cdot (\dim S)^2)$, satisfying:

(a). Lipschitz: $\|\varphi(x) - \varphi(y)\|_2 \le \|x - y\|_2$ for all $x, y \in S$;

(b). Bi-Lipschitz at scale $r$: $\|\varphi(x) - \varphi(y)\|_2 = \Omega(\|x - y\|_2)$ whenever $\|x - y\|_2 \in [\delta r, r]$; and

(c). Boundedness: $\|\varphi(x)\|_2 \le r$ for all $x \in S$.
The constant factor accuracy achieved by this theorem is too weak to achieve the $1+\varepsilon$ distortion
asserted in Theorem 1.2. While we cannot improve condition (b) to a factor of $1+\varepsilon$, we are able
to refine it in a useful way. Roughly speaking, we introduce a "correction" function $G : \mathbb{R} \to \mathbb{R}$,
such that whenever $\|x - y\|_2 \in [\delta r, r]$,

$$\frac{\|\varphi(x) - \varphi(y)\|_2}{\|x - y\|_2} = (1 \pm \varepsilon)\, G\!\left(\frac{\|x - y\|_2}{r}\right). \qquad (1)$$

This function $G$ does not depend on $r$ and equals $\Theta(1)$ in the appropriate range. Using the
correction function, we obtain very accurate bounds on distances in the target space, at the price
of increasing the dimension by a factor of $O(1/\varepsilon^3)$. This high-level idea is implemented in Theorem
3.1, which immediately implies Theorem 1.3, although the precise bound therein slightly differs
from Equation (1).

Embeddings for a single scale are commonly used in the embeddings literature, though not in the 
context of dimension reduction. It is plausible that in some applications, a single-scale embedding 
may suffice, or even provide better bounds than our snowflake embedding (or Question 1).

Other $\ell_p$ Spaces. Our dimension reduction framework extends to $\ell_p$ (i.e. $S \subset \ell_p$ and $\Phi :
S \to \ell_p^k$) for both $p = 1$ and $p = \infty$, as discussed in Section ||. The bounds we obtain therein
are worse than in the $\ell_2$ case, namely the dimension $k$ is at least exponential in $\dim(S)$, which
is to be expected because of strong lower bounds known in terms of $n = |S|$ (see [BC05, LN04]
for $\ell_1$ and [Mat96] for $\ell_\infty$). We remark that previous work on dimension reduction in $\ell_p$ spaces
[JL84, Sch87, Tal90, Bal90, Tal95, Mat96] did not establish any dimension bound in terms of $\lambda(S)$;
these bounds are all expressed in terms of $n = |S|$, or of the dimension of $S$ as a linear subspace.

For ultrametrics, our framework provides even stronger bounds, which resolve Question 1 in
the affirmative, as follows. Ultrametrics embed isometrically (i.e. with distortion 1) into $\ell_2$, hence
Theorem 1.2 is immediately applicable. We can then eliminate the snowflake operator (i.e. achieve
$\alpha = 1$) by the observation that $(M, d)$ is an ultrametric if and only if $(M, d^2)$ is an ultrametric, and
thus Theorem 1.2 is applicable to the ultrametric $d^2$ with $\alpha = 1/2$. Moreover, the dimension bound
can be improved by replacing some steps with more specialized machinery. However, in retrospect
a near-optimal bound may be obtained by minor refinements of [ABN09, Lemma 12].



Technical contribution. The main technical challenge is to keep both distortion and dimension
under tight control. We use a relatively large number of the tools developed recently in the metric
embeddings literature, combining them in a technically non-trivial manner that yields a rather
strong outcome ($1+\varepsilon$ distortion). Several of the tools we use are nonlinear, hence our approach
can potentially be used to circumvent the limitation on linear embeddings observed by [IN07].

Our results may also be viewed as partial progress towards Question 1: Observe that Theorem
1.2 answers that question positively in the special case where also the square of the given Euclidean
metric is known to be Euclidean (e.g. for all ultrametrics). Further, Theorem 1.3 achieves bounds
that relax those required by Question 1. Moreover, if the answer to Question 1 is negative (which
is not unlikely), then our results may be essentially the closest alternative.

1.2 Related work 

A summary of some related work on embeddings, meant to put our results in context, is found in
Table 1. Very recently, and subsequent to the public posting of this paper, the authors of [BRS07]
Reference     Origin space   Target space    Distortion                    Dimension                               Snowflake
[JL84]        $\ell_2$       $\ell_2$        $1+\varepsilon$               $O(\varepsilon^{-2}\log n)$             none ($\alpha = 1$)
[Ass83]       doubling       $\ell_2$        $2^{O(\dim S)}$               $2^{O(\dim S)}$                         fixed $\alpha < 1$
[GKL03]       doubling       $\ell_2$        $O(\dim S)$                   $O(\dim S)$                             fixed $\alpha < 1$
[HM06]        doubling       $\ell_\infty$   $1+\varepsilon$               $\varepsilon^{-O(\dim S)}$              fixed $\alpha < 1$
[ABN08]       doubling       $\ell_p$        $O(\log^{1+\varepsilon} n)$   $O(\varepsilon^{-1}\dim S)$             none
Theorem 1.2   $\ell_2$       $\ell_2$        $1+\varepsilon$               $\tilde{O}(\varepsilon^{-3}\dim^2 S)$   fixed $\alpha < 1$

Table 1: A sampling of related work; holds for arbitrary $\varepsilon \in (0, 1)$



concluded that an extension of their work, coupled with the framework presented here, yields a
stronger version of Theorem 1.2, where the target dimension is improved to $\tilde{O}(\varepsilon^{-3} \dim S)$. This
additional result has been appended to the most recent version of [BRS07].



1.3 Applications 

In many settings, data is provided as points in $\ell_p$, and it is extremely advantageous to represent
the data using a low-dimensional space. For instance, the cost of many data processing tasks (in
terms of runtime, storage or accuracy) grows exponentially in the embedding dimension. In many
such cases, our machinery can reduce the embedding dimension to close to the data's doubling
dimension, leading to significant performance improvement. This approach is suitable for problems
(i) that depend on pairwise distances but can tolerate small distortion; and (ii) whose algorithms
depend heavily on the embedding dimension, so that the improved performance given by the lower
dimension outweighs the overhead cost of computing the dimensionality reduction.

We provide in Section [| two examples where our dimensionality reduction results have immediate
algorithmic applications. The first one is an approximate Distance Labeling Scheme, where
the main complexity measure is the storage required at each network node. The second example
is approximation algorithms for clustering problems, where running time is typically exponential
in the dimension. In both cases, the final approximation obtained is $1+\varepsilon$.

On a more conceptual level, our embeddings may explain a common empirical phenomenon
regarding low-dimensional data: Many heuristics that represent (non-Euclidean) input data as
points in Euclidean space find that low-dimensional Euclidean space is sufficient to yield a fair
representation, see e.g. [NZ02] for networking and [TdSL00, RS00] for machine learning. Our
results can be interpreted as conveying the following principle: Intrinsically low-dimensional data
that admits a meaningful representation in $\ell_2$ can actually be represented in low-dimensional $\ell_2$.



Implementation. All our embedding results are algorithmic: they are constructive and can be
computed in polynomial time. The details are mostly straightforward, and we do not address this
issue explicitly. It is possible that the running time may be improved further, and perhaps even be
brought close to linear. (For example, the Gaussian transform is computed quickly via the Gram
matrix.) Two nontrivial steps in this direction are the implementation of Kirszbraun's Theorem,
which is usually solved as a semidefinite program, and our use of padded partitions, which require
an application of the Lovász Local Lemma.
2 Preliminaries and tools



Doubling dimension. For a metric $(X, d)$, let $\lambda$ be the infimum value such that every ball in $X$ can
be covered by $\lambda$ balls of half the radius. The doubling dimension of $X$ is $\dim(X) = \log_2 \lambda$. A metric
is doubling when its doubling dimension is finite. The following property can be demonstrated via
repeated application of the doubling property.

Property 2.1. For a set $S$ with doubling dimension $\log \lambda$, if the minimum interpoint distance in $S$
is at least $\alpha$, and the diameter of $S$ is at most $\beta$, then $|S| \le \lambda^{O(\log(\beta/\alpha))}$.

$\varepsilon$-nets. For a point set $S$, an $\varepsilon$-net of $S$ is a subset $T \subset S$ with the following properties: (i) Packing:
For every pair of distinct points $u, v \in T$, $d(u, v) \ge \varepsilon$. (ii) Covering: Every point $u \in S$ is strictly
within distance $\varepsilon$ of some point $v \in T$: $d(u, v) < \varepsilon$.
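An $\varepsilon$-net with both properties can be obtained by a simple greedy scan, sketched below. The paper does not specify how its nets are computed, so this is only one straightforward option; the function name and the use of Euclidean tuples are assumptions of the example.

```python
import math

def greedy_net(points, eps, dist=math.dist):
    """Greedy eps-net: keep a point only if it is at distance >= eps from all kept points.

    Packing holds by construction. Covering holds because a rejected point was
    strictly within eps of some kept point, and a kept point covers itself.
    """
    net = []
    for p in points:
        if all(dist(p, q) >= eps for q in net):
            net.append(p)
    return net
```

The scan costs $O(|S| \cdot |T|)$ distance evaluations; the resulting net depends on the order in which points are visited.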



Lipschitz norm. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. A function $f : X \to Y$ is said to
be $K$-Lipschitz (for $K > 0$) if for all $x, x' \in X$ we have $d_Y(f(x), f(x')) \le K \cdot d_X(x, x')$. The
Lipschitz constant (or Lipschitz norm) of $f$, denoted $\|f\|_{Lip}$, is the infimum over $K > 0$ satisfying
the above. A 1-Lipschitz function is called in short Lipschitz. We recall the following basic property
of Lipschitz functions: Let $f : X \to \ell_2$ and $g : X \to \mathbb{R}$. Then their product $fg : x \mapsto g(x) \cdot f(x)$
has Lipschitz norm

$$\|fg\|_{Lip} \le \|f\|_{Lip} \cdot \max_x |g(x)| + \|g\|_{Lip} \cdot \max_x \|f(x)\|.$$



Extension Theorem. The Kirszbraun Theorem [Kir34] states that if $S$ and $X$ are Euclidean
spaces, $T \subset S$, and there exists a Lipschitz function $f : T \to X$, then there exists a function
$\tilde{f} : S \to X$ that has the same Lipschitz constant as $f$ and also extends $f$, i.e. $\tilde{f}|_T = f$, meaning
that the restriction of $\tilde{f}$ to $T$ is identical to $f$.

Bounded distances and the Gaussian Transform. A metric transform maps a distance
function to another distance function on the same set of points (e.g. maps $(X, d)$ to $(X, d^{1/2})$). We
say that a metric transform is bounded (by $T > 0$) if it always results in a distance function where
all interpoint distances are bounded (by $T$). The Gaussian transform is a metric transform that
maps value $t$ to $G_r(t) = r(1 - e^{-t^2/r^2})^{1/2}$, where $r > 0$ is a parameter. Schoenberg [Sch38, DL97]
showed that the Gaussian transform maps Euclidean spaces to Euclidean spaces. That is, for
every $r > 0$ and $X \subset L_2$ there is an embedding $g : X \to L_2$ such that for all $x, y \in X$ we have
$\|g(x) - g(y)\|_2 = G_r(\|x - y\|_2)$. It is easily verified that

$$G_r(t) \le t, \qquad \forall t \ge 0, \qquad (2)$$

thus $\|g\|_{Lip} \le 1$. In addition, $G_r(t) \le r$ for all $t$, hence the Gaussian transform is bounded. The
idea of using bounded transforms for embeddings is due to [BRS07].
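The two properties just stated, $G_r(t) \le t$ and $G_r(t) \le r$, are easy to check numerically. The sketch below evaluates the transform on distance values only; it does not construct Schoenberg's embedding $g$, and the function name is an assumption of the example.

```python
import math

def gaussian_transform(t, r):
    """G_r(t) = r * sqrt(1 - exp(-t^2 / r^2)) for t >= 0 and r > 0."""
    # -expm1(-x) evaluates 1 - exp(-x) without cancellation error for small x.
    return r * math.sqrt(-math.expm1(-(t / r) ** 2))
```

For $t$ much smaller than $r$ the value is close to $t$, and for $t$ much larger than $r$ it saturates at $r$, which is what makes the transform act as a soft truncation of distances at scale $r$.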



Probabilistic partitions. Probabilistic partitions are a common tool used in embeddings. Let
$(X, d)$ be a finite metric space. A partition $P$ of $X$ is a collection of non-empty pairwise disjoint
clusters $P = \{C_1, C_2, \ldots, C_t\}$ such that $X = \cup_j C_j$. For $x \in X$ we denote by $P(x)$ the cluster
containing $x$.

We will need the following decomposition lemma due to Gupta, Krauthgamer and Lee [GKL03]
and Abraham, Bartal and Neiman [ABN08]. Let $B(x, r) = \{y : \|x - y\| \le r\}$.
Theorem 2.1 (Padded Decomposition of doubling metrics [GKL03, ABN08]). There exists a
constant $c_0 > 1$, such that for every metric space $(X, d)$ and every $\Delta > 0$, there is a multi-set
$\mathcal{D} = [P_1, \ldots, P_m]$ of partitions of $X$, with $m \le c_0 \varepsilon^{-1} \dim(X) \log \dim(X)$, such that

1. Bounded radius: $\mathrm{diam}(C) \le \Delta$ for all clusters $C \in \cup_{i=1}^m P_i$.

2. Padding: If $P$ is chosen uniformly from $\mathcal{D}$, then for all $x \in X$,

$$\Pr\left[B\left(x, \frac{\Delta}{c_0 \dim(X)}\right) \subseteq P(x)\right] \ge 1 - \varepsilon.$$

Remark: [GKL03] provided slightly different quantitative bounds than in Theorem 2.1. The
two enumerated properties follow from Lemma 2.7 in [ABN08], and the bound on support-size
follows by an application of the Lovász Local Lemma sketched therein.
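To make the padding property concrete, the following sketch checks whether a point is padded in a given partition and measures the fraction of partitions in a multi-set that pad it. It does not construct the partitions of Theorem 2.1; the representation of clusters as sets of coordinate tuples and all function names are assumptions of the example.

```python
import math

def cluster_of(partition, x):
    """Return the cluster P(x) of the partition that contains x."""
    for cluster in partition:
        if x in cluster:
            return cluster
    raise ValueError("x is not covered by the partition")

def is_padded(points, partition, x, radius, dist=math.dist):
    """True if the ball B(x, radius) = {y : dist(x, y) <= radius} lies inside P(x)."""
    own = cluster_of(partition, x)
    return all(y in own for y in points if dist(x, y) <= radius)

def padded_fraction(points, partitions, x, radius, dist=math.dist):
    """Fraction of the partitions in the multi-set in which x is padded."""
    return sum(is_padded(points, P, x, radius, dist) for P in partitions) / len(partitions)
```

With partitions drawn as in Theorem 2.1, `padded_fraction` would be at least $1 - \varepsilon$ for every point when `radius` is $\Delta / (c_0 \dim(X))$.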






3 Dimension Reduction for $\ell_2$

In this section we first design a single scale embedding that achieves distortion $1+\varepsilon$ after including a
correction function. This result is stated in Theorem 3.1 below, which is a refined version of Theorem
1.3. We then use this single scale embedding to prove Theorem 1.2 in Section 3.3. Throughout
this section, the norm notation $\|\cdot\|$ denotes $\ell_2$-norms. We make no attempt to optimize constants.
Following Section 2, define $G : \mathbb{R} \to \mathbb{R}$ by $G(t) = (1 - e^{-t^2})^{1/2}$, and let

$$G_r(t) = r \cdot G(t/r) = r(1 - e^{-t^2/r^2})^{1/2}.$$

Theorem 3.1. For every scale $r > 0$ and every $0 < \delta, \varepsilon < 1/4$, every finite set $S \subset \ell_2$ admits an
embedding $\varphi : S \to \ell_2^k$ for $k = \tilde{O}(\varepsilon^{-3} \log\frac{1}{\delta} \cdot (\dim S)^2)$, satisfying:

(a). Lipschitz: $\|\varphi(x) - \varphi(y)\| \le \|x - y\|$ for all $x, y \in S$.

(b). $1+\varepsilon$ distortion to the Gaussian (at scales near $r$): For all $x, y \in S$ with $\delta r \le \|x - y\| \le \frac{r}{\delta}$,

$$\frac{1}{1+\varepsilon} \le \frac{\|\varphi(x) - \varphi(y)\|}{G_r(\|x - y\|)} \le 1.$$

(c). Boundedness: $\|\varphi(x)\| \le r$ for all $x \in S$.

In the sequel, we shall prove bounds that are slightly weaker than those stated above but only
by a constant $C > 1$, e.g. $\|\varphi\|_{Lip} \le 1 + C\varepsilon$. The actual theorem follows immediately from these
bounds by scaling of $\varphi$ by $\frac{1}{1+C\varepsilon}$ and scaling of $\varepsilon$ by $1/C$.

3.1 Embedding for a single scale 



Our construction of the embedding $\varphi$ for Theorem 3.1 proceeds in seven steps, as described below.
Let $\lambda = \lambda(S)$. All the hidden constants are absolute, i.e. independent of $\lambda$, $\varepsilon$, $\delta$ and $r$. It is
plausible that the dependence of target dimension on $\log \lambda$ can be improved to be near-linear, by
carefully combining some of these steps.

Step 1 (Net Extraction): Let $N \subseteq S$ be an $(\varepsilon\delta r)$-net in $S$.

Step 2 (Padded Decomposition): Compute for $N$ a padded decomposition with padding $\frac{3r}{\delta}$.
More specifically, by Theorem 2.1, there is a multiset $[P_1, \ldots, P_m]$ of partitions of $N$, where
every point is $\frac{3r}{\delta}$-padded in $1 - \varepsilon$ fraction of the partitions, all clusters have diameter bounded
by $\Delta = O(\frac{r}{\delta} \log \lambda)$, and $m = O(\varepsilon^{-1} \log \lambda \log\log \lambda)$.

Step 3 (Bounding Distances): In each partition $P_i$ and each cluster $C \in P_i$, bound the interpoint
distances in $C$ at maximum value $r$. Specifically, using a Gaussian transform as per
Section 2, obtain a map $g_C : C \to \ell_2$ such that

$$\|g_C(x) - g_C(y)\|^2 = G_r(\|x - y\|)^2 = r^2(1 - e^{-\|x-y\|^2/r^2}), \qquad \forall x, y \in C.$$

Step 4 (Dimension Reduction): For each partition $P_i$ and each cluster $C \in P_i$, the point set
$g_C(C) \subset \ell_2$ admits a dimension reduction, with distortion $1+\varepsilon$. Specifically, by the JL-Lemma
there is a map $\Psi_{JL} : g_C(C) \to \ell_2^{k'}$ such that

$$\frac{\|t - t'\|}{1+\varepsilon} \le \|\Psi_{JL}(t) - \Psi_{JL}(t')\| \le \|t - t'\|, \qquad \forall t, t' \in g_C(C), \qquad (3)$$

and the target dimension is (using Property 2.1)

$$k' = O(\varepsilon^{-2} \log |C|) = O\big(\varepsilon^{-2} \log(\lambda^{O(\log(\Delta/\varepsilon\delta r))})\big) = O\big(\varepsilon^{-2} \log\tfrac{1}{\varepsilon\delta} \cdot \log \lambda \log\log \lambda\big).$$

Composing the last two steps, define $f_C = \Psi_{JL} \circ g_C$ mapping $C \to \ell_2^{k'}$.

Step 5 (Gluing Clusters): For each partition $P_i$, "glue" the cluster embeddings $f_C$ by smoothing
them near the boundary. Specifically, for each cluster $C \in P_i$, assume by translation that $f_C$
attains the origin, i.e. there exists $z_C \in C$ such that $\|f_C(z_C)\| = 0$. Define $h_C : C \to \mathbb{R}$ by
$h_C(x) = \min_{y \in N \setminus C} \|x - y\|$, as a proxy for $x$'s distance to the boundary of its cluster. Now
define $\varphi_i : N \to \ell_2^{k'}$ by

$$\varphi_i(x) = f_{P_i(x)}(x) \cdot \min\{1, \tfrac{\delta}{r} h_{P_i(x)}(x)\};$$

recall that $P_i(x)$ is the unique cluster $C \in P_i$ containing $x$.

Step 6 (Gluing Partitions): Combine the maps obtained in the previous step via direct sum
and scaling. Specifically, define $\varphi : N \to \ell_2^{mk'}$ by $\varphi = m^{-1/2} \bigoplus_{i=1}^m \varphi_i$.

Step 7 (Extension beyond the Net): Use the Kirszbraun theorem to extend the map $\varphi$ to all
of $S$, without increasing the Lipschitz constant.
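The gluing in Steps 5 and 6 can be sketched as follows, taking the per-cluster maps $f_C$ as given. The damping slope is passed in as a parameter (`scale_factor`, standing for the coefficient multiplying $h_C$ inside the minimum) rather than hard-coded; the dictionary-based representation and all function names are assumptions of this example, not part of the construction itself.

```python
import math

def boundary_proxy(x, cluster, net, dist=math.dist):
    """h_C(x): distance from x to the nearest net point outside its cluster C."""
    outside = [y for y in net if y not in cluster]
    return min((dist(x, y) for y in outside), default=math.inf)

def glue_cluster_maps(net, partition, cluster_maps, scale_factor, dist=math.dist):
    """Step 5 sketch: phi_i(x) = f_C(x) * min(1, scale_factor * h_C(x)) for x in C.

    cluster_maps[j] is a dict sending each point of partition[j] to its image
    vector under f_C (assumed already translated so that some point maps to 0).
    """
    phi = {}
    for cluster, f in zip(partition, cluster_maps):
        for x in cluster:
            damp = min(1.0, scale_factor * boundary_proxy(x, cluster, net, dist))
            phi[x] = [damp * coord for coord in f[x]]
    return phi

def glue_partitions(per_partition_maps):
    """Step 6 sketch: phi = m^(-1/2) times the direct sum of phi_1, ..., phi_m."""
    m = len(per_partition_maps)
    points = per_partition_maps[0].keys()
    return {x: [c / math.sqrt(m) for phi_i in per_partition_maps for c in phi_i[x]]
            for x in points}
```

Points deep inside their cluster keep their image unchanged (the damping factor is 1), while points near a cluster boundary are pulled toward the origin, which is what keeps the glued map Lipschitz across cluster boundaries.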



3.2 Proof of Theorem 3.1 



Let us show the embedding $\varphi$ constructed above indeed satisfies the conclusion of Theorem 3.1. By
construction, the target dimension is $mk' = O(\varepsilon^{-3} \log\frac{1}{\varepsilon\delta} (\log \lambda \log\log \lambda)^2)$.

We first focus on points in the net $N$, and later extend the analysis to all points in $S$. Let us
start with a few immediate observations.

Lemma 3.2. For every $x, y \in N$ and every $i \in \{1, \ldots, m\}$,

(i). $\|f_{P_i(x)}(x)\| \le r$.

(ii). If $P_i(x) = P_i(y) = C$ then $\|f_C(x) - f_C(y)\| \le G_r(\|x - y\|) \le \|x - y\|$.

(iii). If $P_i(x) \ne P_i(y)$ then $h_{P_i(x)}(x) \le \|x - y\|$.
Proof of Lemma 3.2. For assertion (i), recall that by the translation, every cluster, and in particular
$C = P_i(x)$, contains a point $z_C \in N$ such that $f_C(z_C) = 0$. Thus, using Equation (2) we have

$$\|f_C(x) - f_C(z_C)\| \le \|g_C(x) - g_C(z_C)\| = G_r(\|x - z_C\|) \le r.$$

To prove assertion (ii), use Equations (2) and (3) to get

$$\|f_C(x) - f_C(y)\| \le \|\Psi_{JL}\|_{Lip} \cdot \|g_C(x) - g_C(y)\| \le G_r(\|x - y\|) \le \|x - y\|.$$

For assertion (iii), since $C = P_i(x) \ne P_i(y)$ we have that $y \in N \setminus C$, and so
$h_C(x) = \min_{z \in N \setminus C} \|x - z\| \le \|x - y\|$. □



Analysis for the net $N$. We now prove assertions (a)-(c) of Theorem 3.1 for (only) net points.
(We shall need this later to complete the proof of the theorem.) To this end, fix $x, y \in N$.

(a) Lipschitz: If $\|x - y\| > \frac{2r}{\delta}$, we use the boundedness condition and the fact that $\delta < \frac{1}{4}$ to get

$$\|\varphi(x) - \varphi(y)\| \le \|\varphi(x)\| + \|\varphi(y)\| \le 2r < \frac{2r}{\delta} < \|x - y\|.$$

Assume now that $\|x - y\| \le \frac{2r}{\delta}$. Then by Step 6

$$\|\varphi(x) - \varphi(y)\|^2 = \frac{1}{m} \sum_{i=1}^m \|\varphi_i(x) - \varphi_i(y)\|^2. \qquad (4)$$

To bound the righthand side, fix $i \in \{1, \ldots, m\}$ and consider separately the following three
cases. Throughout, write $\tilde{h}_C(z) = \min\{1, \frac{\delta}{r} h_C(z)\}$ for the smoothing factor of Step 5.

Case 1: $x$ is padded. The padding is $\frac{3r}{\delta}$, hence $x$ and $y$ belong to the same cluster $C =
P_i(x) = P_i(y)$. Furthermore, $h_C(x) \ge \frac{3r}{\delta}$, thus $\varphi_i(x) = f_C(x)$; similarly $h_C(y) \ge
h_C(x) - \|x - y\| \ge \frac{r}{\delta}$ and thus $\varphi_i(y) = f_C(y)$. Using Lemma 3.2(ii),

$$\|\varphi_i(x) - \varphi_i(y)\| = \|f_C(x) - f_C(y)\| \le \|x - y\|.$$

Case 2: $x$ is not padded and $P_i(x) \ne P_i(y)$. By Lemma 3.2(iii),

$$\|\varphi_i(x)\| \le \|f_{P_i(x)}(x)\| \cdot \tfrac{\delta}{r} h_{P_i(x)}(x) \le \delta h_{P_i(x)}(x) \le \delta \|x - y\|.$$

Using a similar bound for $\varphi_i(y)$, we obtain

$$\|\varphi_i(x) - \varphi_i(y)\| \le \|\varphi_i(x)\| + \|\varphi_i(y)\| \le 2\delta \|x - y\| \le \|x - y\|.$$

Case 3: $x$ is not padded and $x, y$ belong to the same cluster $P_i(x) = P_i(y) = C$. Restrict
$\varphi_i$ to $C$ and write it as the product of the two functions $z \mapsto f_C(z)$ and $\tilde{h}_C : z \mapsto
\min\{1, \frac{\delta}{r} h_C(z)\}$. It follows that

$$\frac{\|\varphi_i(x) - \varphi_i(y)\|}{\|x - y\|} \le \|f_C\|_{Lip} \cdot \max_{z \in C} |\tilde{h}_C(z)| + \|\tilde{h}_C\|_{Lip} \cdot \max_{z \in C} \|f_C(z)\|.$$

It is easy to verify that $\max_{z \in C} |\tilde{h}_C(z)| \le 1$ and $\|\tilde{h}_C\|_{Lip} \le \frac{\delta}{r} \cdot \|h_C\|_{Lip} \le \frac{\delta}{r}$. Plugging in
these estimates and the bounds on $f_C$ obtained from Lemma 3.2, we have

$$\frac{\|\varphi_i(x) - \varphi_i(y)\|}{\|x - y\|} \le 1 \cdot 1 + \frac{\delta}{r} \cdot r = 1 + \delta.$$

Now combine these three cases by plugging into Equation (4). Since $x$ is padded in at least
$1 - \varepsilon$ fraction of the partitions $P_i$, and for the remaining partitions we can use the worst
bound among the three cases, we get

$$\|\varphi(x) - \varphi(y)\|^2 \le (1 - \varepsilon)\|x - y\|^2 + \varepsilon(1 + \delta)\|x - y\|^2 = (1 + \varepsilon\delta)\|x - y\|^2.$$



(b) Distortion to the Gaussian: We assume henceforth a slightly extended range $\frac{1}{2}\delta r \le \|x - y\| \le
\frac{2r}{\delta}$. Observe that $\frac{G_r(t)}{t}$ is monotonically decreasing in $t$, hence

$$\frac{G_r(\|x - y\|)}{\|x - y\|} \ge \frac{G_r(2r/\delta)}{(2r/\delta)} \ge \frac{G_r(r)}{(2r/\delta)} \ge \frac{\delta}{3}. \qquad (5)$$

We proceed by considering the exact same three cases as above.

Case 1': $x$ is padded. By the analogous case above, $P_i(x) = P_i(y) = C$ and

$$\|\varphi_i(x) - \varphi_i(y)\| = \|f_C(x) - f_C(y)\|.$$

By (3) we have $1 - \varepsilon \le \frac{\|f_C(x) - f_C(y)\|}{\|g_C(x) - g_C(y)\|} \le 1$, where, by construction, the denominator
equals $G_r(\|x - y\|)$. Altogether, we get

$$1 - \varepsilon \le \frac{\|\varphi_i(x) - \varphi_i(y)\|}{G_r(\|x - y\|)} \le 1.$$

Case 2': $x$ is not padded and $P_i(x) \ne P_i(y)$. Combining the analogous case above and
Equation (5), we have

$$\|\varphi_i(x) - \varphi_i(y)\| \le 2\delta \|x - y\| \le 6 G_r(\|x - y\|).$$

Case 3': $x$ is not padded and $x, y$ belong to the same cluster $P_i(x) = P_i(y) = C$. Refining
the analysis in the analogous case above and using Equation (5), we have, with $\tilde{h}_C(z) = \min\{1, \frac{\delta}{r} h_C(z)\}$,

$$\|\varphi_i(x) - \varphi_i(y)\| = \|f_C(x)\tilde{h}_C(x) - f_C(y)\tilde{h}_C(y)\|$$
$$\le \|f_C(x)\tilde{h}_C(x) - f_C(x)\tilde{h}_C(y)\| + \|f_C(x)\tilde{h}_C(y) - f_C(y)\tilde{h}_C(y)\|$$
$$\le \|f_C(x)\| \cdot |\tilde{h}_C(x) - \tilde{h}_C(y)| + \|f_C(x) - f_C(y)\| \cdot |\tilde{h}_C(y)|$$
$$\le r \cdot \tfrac{\delta}{r} \cdot \|x - y\| + G_r(\|x - y\|) \cdot 1$$
$$\le 4 G_r(\|x - y\|).$$

Again combine these three cases by plugging into Equation (4) and recalling that $x$ is padded
in at least $1 - \varepsilon$ fraction of partitions; we thus get

$$(1 - \varepsilon)^2 \le \frac{\|\varphi(x) - \varphi(y)\|^2}{G_r(\|x - y\|)^2} \le (1 - \varepsilon) + \varepsilon \cdot 36 = 1 + 35\varepsilon. \qquad (6)$$

For later use, let us record that

$$\|\varphi(x) - \varphi(y)\| \le G_r(\|x - y\|)(1 + 35\varepsilon)^{1/2} \le G_r(\|x - y\|)(1 + 18\varepsilon) \le \tfrac{11}{2} G_r(\|x - y\|). \qquad (7)$$
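(c) Boundedness: By the fact $0 \le \min\{1, \frac{\delta}{r} h_{P_i(x)}(x)\} \le 1$ and Lemma 3.2(i),

$$\|\varphi(x)\|^2 = \frac{1}{m} \sum_{i=1}^m \|\varphi_i(x)\|^2 \le \frac{1}{m} \sum_{i=1}^m \|f_{P_i(x)}(x)\|^2 \le r^2.$$

This completes the analysis for net points $x, y \in N$.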
Analysis for entire $S$. We extend the previous analysis to all points in $S$. Fix $x, y \in S$, and
let $x', y' \in N$ be the net points closest to $x$ and $y$, respectively. Recalling that $N$ is an $\varepsilon\delta r$-net, we
have $\|x - x'\|, \|y - y'\| \le \varepsilon\delta r$. To prove the Lipschitz requirement, recall that Step 7 extends $\varphi$
from the net $N$ to the entire $S$ using the Kirszbraun theorem, i.e. without increasing its Lipschitz
norm, hence

$$\|\varphi(x) - \varphi(y)\| \le \|x - y\|.$$

Using this Lipschitz condition and the triangle inequality, we immediately obtain the boundedness
requirement:

$$\|\varphi(x)\| \le \|\varphi(x')\| + \|\varphi\|_{Lip} \|x - x'\| \le (1 + \varepsilon\delta) r.$$

To prove the requirement of distortion to the Gaussian (which is slightly more involved) assume
further that $\delta r \le \|x - y\| \le \frac{r}{\delta}$. By the triangle inequality,

$$\big| \|x - y\| - \|x' - y'\| \big| \le \|x - x'\| + \|y - y'\| \le 2\varepsilon\delta r. \qquad (8)$$

We conclude that $(1 - 2\varepsilon)\delta r \le \|x' - y'\| \le (\frac{1}{\delta} + 2\varepsilon\delta) r$, and hence the bound for distortion
to the Gaussian applies to $x', y' \in N$. It also follows that $2\varepsilon\delta r \le 4(1 - 2\varepsilon)\varepsilon\delta r \le 4\varepsilon \|x' - y'\|$. Using the
Lipschitz condition on $\varphi$, and the above distortion to the Gaussian for net points (Equation (7)),
we similarly derive

$$\big| \|\varphi(x) - \varphi(y)\| - \|\varphi(x') - \varphi(y')\| \big| \le \|x - x'\| + \|y - y'\| \le 2\varepsilon\delta r \le 4\varepsilon \|x' - y'\| \le 22\varepsilon\, G_r(\|x' - y'\|). \qquad (9)$$
We shall need the following bound on the behavior of $G_r(t)$.

Lemma 3.3. Let $0 < \eta < 1/3$ and suppose $0 \le t' \le (1 + \eta)t$. Then $\frac{G_r(t')}{G_r(t)} \le 1 + 3\eta$.

Proof of Lemma 3.3. Observe that $G_r(t)$ is monotonically increasing (in $t$), and thus

$$\frac{G_r(t')}{G_r(t)} \le \frac{G_r((1 + \eta)t)}{G_r(t)} = \frac{G((1 + \eta)t/r)}{G(t/r)}.$$

Letting $s = t/r$, we have

$$\frac{G((1 + \eta)s)^2}{G(s)^2} - 1 = \frac{G((1 + \eta)s)^2 - G(s)^2}{G(s)^2} = \frac{e^{-s^2} - e^{-(1+\eta)^2 s^2}}{1 - e^{-s^2}} = \frac{e^{-s^2}(1 - e^{-(2\eta + \eta^2)s^2})}{1 - e^{-s^2}}. \qquad (10)$$

Recall that for all $0 \le z \le 1$ we have $1 - z \le e^{-z} \le 1 - z + z^2/2 \le 1 - z/2$. Using this estimate,
and the bound $2\eta + \eta^2 \le 3\eta$, we now have three cases:

• When $s^2 \le 1$, the righthand side of (10) is at most $\frac{3\eta s^2}{s^2/2} \le 6\eta$.

• When $1 \le s^2 \le 1/3\eta$, the righthand side of (10) is at most $\frac{e^{-s^2} \cdot 3\eta s^2}{1/2} = 6\eta s^2 e^{-s^2} \le 6\eta/e$, where
the last inequality follows from the observation that $z \mapsto z e^{-z}$ is monotonically decreasing
for all $z \ge 1$.

• When $s^2 > 1/3\eta$, the righthand side of (10) is at most $\frac{e^{-s^2}}{1/2} \le 6\eta s^2 e^{-s^2} \le 6\eta/e$, where the
last inequality follows similarly to the previous case.

Altogether, we conclude that $\frac{G_r(t')}{G_r(t)} \le \frac{G((1 + \eta)s)}{G(s)} \le \sqrt{1 + 6\eta} \le 1 + 3\eta$. □
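Lemma 3.3 can be sanity-checked numerically. Since $G_r(t')/G_r(t) = G(t'/r)/G(t/r)$ and $G$ is increasing, it suffices to scan the ratio $G((1+\eta)s)/G(s)$ over a grid of values $s$. The grid size and range below are arbitrary choices for this sketch, which is an illustration and not a proof.

```python
import math

def G(s):
    """G(s) = sqrt(1 - exp(-s^2)), evaluated stably for small s."""
    return math.sqrt(-math.expm1(-s * s))

def worst_ratio(eta, samples=2000, s_max=10.0):
    """Largest value of G((1 + eta) * s) / G(s) over a uniform grid of s in (0, s_max]."""
    worst = 0.0
    for k in range(1, samples + 1):
        s = s_max * k / samples
        worst = max(worst, G((1 + eta) * s) / G(s))
    return worst
```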

We are now ready to complete the proof of distortion to the Gaussian (for the entire set $S$).
Similar to the derivation of Equation (9), we derive $\|x' - y'\| \le (1 + 2\varepsilon)\|x - y\|$, and by Lemma 3.3
we get $G_r(\|x' - y'\|) \le (1 + 6\varepsilon) G_r(\|x - y\|)$. Together with Equation (9) and the upper bound for
net points (Equation (7)), we obtain

$$\|\varphi(x) - \varphi(y)\| \le \|\varphi(x') - \varphi(y')\| + 22\varepsilon\, G_r(\|x' - y'\|)$$
$$\le (1 + 18\varepsilon + 22\varepsilon)\, G_r(\|x' - y'\|)$$
$$\le (1 + 40\varepsilon)(1 + 6\varepsilon)\, G_r(\|x - y\|).$$

The other direction is analogous. By (9) we have $\|x - y\| \le (1 + 4\varepsilon)\|x' - y'\|$, and by Lemma 3.3
we get $G_r(\|x - y\|) \le (1 + 12\varepsilon) G_r(\|x' - y'\|)$. Together with (9) and the lower bound for net points
(Equation (6)), we obtain

$$\|\varphi(x) - \varphi(y)\| \ge \|\varphi(x') - \varphi(y')\| - 22\varepsilon\, G_r(\|x' - y'\|)$$
$$\ge (1 - \varepsilon - 22\varepsilon)\, G_r(\|x' - y'\|)$$
$$\ge (1 - 23\varepsilon)(1 - 12\varepsilon)\, G_r(\|x - y\|).$$

3.3 Snowflake Embedding 



We now use Theorem 3.1 (the single scale embedding) to prove Theorem 1.2 (embedding for $d^\alpha$).
For simplicity, we will first prove the theorem for $\alpha = 1/2$, and then extend the proof to arbitrary
$0 < \alpha < 1$.

Fix a finite set $S \subset \ell_2$ and $0 < \varepsilon < 1/4$. Assume without loss of generality that the minimum
interpoint distance in $S$ is 1. Define $p = 6\lceil \log_{1+\varepsilon}(\frac{1}{\varepsilon}) \rceil = O(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})$, and the set $I = \{i \in \mathbb{Z} : \varepsilon^5 \le
(1 + \varepsilon)^i \le \varepsilon^{-5} \mathrm{diam}(S)\}$. For each $i \in I$, let $\varphi_i : S \to \ell_2^k$ be the embedding that achieves the bounds
of Theorem 3.1 for $S$ and $\varepsilon$ with respect to parameters $r = (1 + \varepsilon)^i$ and $\delta = (1 + \varepsilon)^{-p/2} = \Theta(\varepsilon^3)$.
Notice that each $\varphi_i$ has target dimension $k = \tilde{O}(\varepsilon^{-3} \log^2 \lambda)$.

We shall now use the following technique due to Assouad [Ass83]. First, each $\varphi_i$ is scaled
by $1/\sqrt{r} = (1 + \varepsilon)^{-i/2}$. They are then grouped in a round robin fashion into $p$ groups, and the
embeddings in each group are summed up. This yields $p$ embeddings, each into $\ell_2^k$, which are
combined using a direct-sum, resulting in one map $\Phi$ into $\ell_2^{pk}$.
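The round-robin combination just described can be sketched for a single point as follows, taking the per-scale images as given. The input format (a dictionary from scale index to image vector) and the function name are assumptions of this example; the final rescaling by a normalization constant is omitted.

```python
def assouad_combine(scale_maps, p, eps):
    """Combine per-scale images of one point into a single vector of length p * k.

    scale_maps: dict {i: vector phi_i(x)}, all vectors of the same length k.
    Scale i is weighted by (1 + eps)^(-i/2), i.e. 1/sqrt(r) for r = (1 + eps)^i,
    and added into group number i mod p; the p groups are then concatenated.
    """
    k = len(next(iter(scale_maps.values())))
    groups = [[0.0] * k for _ in range(p)]
    for i, vec in scale_maps.items():
        weight = (1 + eps) ** (-i / 2)
        group = groups[i % p]  # Python's % is non-negative here, also for negative i
        for c in range(k):
            group[c] += weight * vec[c]
    return [coord for group in groups for coord in group]
```

Scales that land in the same group differ by a multiple of $p$ in index, so their radii differ by a factor of at least $(1+\varepsilon)^p$; this separation is what lets a single scale dominate each group in the analysis.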

Formally, let $i \equiv_p j$ denote that two integers $i, j$ are equal modulo $p$. Define $\Phi : S \to \ell_2^{pk}$ using
the direct sum $\Phi = \bigoplus_{j \in [p]} \Phi_j$ where each $\Phi_j : S \to \ell_2^k$ is given by

$$\Phi_j = \sum_{i \in I:\ i \equiv_p j} \frac{\varphi_i}{(1 + \varepsilon)^{i/2}}.$$

For $M = M(\varepsilon) > 0$ that will be defined later, our final embedding is $\Phi/\sqrt{M} : S \to \ell_2^{pk}$, which
has target dimension $pk \le \tilde{O}(\varepsilon^{-4} \log^2 \lambda)$, as required (for $\alpha = 1/2$). It thus remains to prove the
distortion bound. The key idea is that in each $\Phi_j$, most of the contribution to $\|\Phi_j(x) - \Phi_j(y)\|$
comes from a single $\varphi_i$, and we can further estimate that contribution quite accurately: It behaves
roughly like (1 + e) J / 2 ||x — y\\. We need the following lemma. 
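Lemma 3.4. Let $\Phi : S \to \ell_2^{pk}$ be as above, let $x,y \in S$, and define $B_i = \frac{\|\varphi_i(x)-\varphi_i(y)\|}{(1+\epsilon)^{i/2}}$. Then for every interval $A \subset I$ of size $p$ (namely $A = \{a, a+1, \ldots, a+p-1\}$),

\[
\|\Phi(x)-\Phi(y)\|^2 \le \sum_{i \in A} \Big( B_i + \sum_{i' \in I \setminus A:\ i' \equiv_p i} B_{i'} \Big)^2,
\]

\[
\|\Phi(x)-\Phi(y)\|^2 \ge \sum_{i \in A} \Big( \max\Big\{0,\ B_i - \sum_{i' \in I \setminus A:\ i' \equiv_p i} B_{i'} \Big\} \Big)^2.
\]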

Proof. By construction,

\[
\|\Phi(x)-\Phi(y)\|^2 = \sum_{j \in [p]} \|\Phi_j(x)-\Phi_j(y)\|^2 = \sum_{j \in [p]} \Big\| \sum_{i \in I:\ i \equiv_p j} \frac{\varphi_i(x)-\varphi_i(y)}{(1+\epsilon)^{i/2}} \Big\|^2.
\]

Fix $i \in A$ and let us bound the term corresponding to $i$. The first required inequality now follows by separating (among all $i' \in I$ with $i' \equiv_p i$) the term for $i' = i$ from the rest, and applying the triangle inequality for vectors $v_1,\ldots,v_s \in \ell_2^k$, namely, $\|\sum_l v_l\| \le \sum_l \|v_l\|$. The second inequality follows similarly by separating the term for $i' = i$ from the rest, and applying the following triangle inequality for vectors $u, v_1, \ldots, v_s \in \ell_2^k$, namely, $\|u + \sum_l v_l\| \ge \max\{0, \|u\| - \sum_l \|v_l\|\}$. □
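
As a sanity check, the two bounds of Lemma 3.4 can be verified numerically on arbitrary difference vectors, since they rely only on the triangle inequality. In the sketch below the input `diffs` (standing for $\varphi_i(x)-\varphi_i(y)$) and the function name are illustrative assumptions, and the index set $I$ is taken to be the key set of `diffs`, which must contain the interval $A$.

```python
import math

def lemma_bounds(diffs, p, a, eps):
    """Return (lower, actual, upper) for ||Phi(x) - Phi(y)||^2.

    diffs: dict mapping scale index i to the vector phi_i(x) - phi_i(y)
           (assumed input); the interval A is {a, ..., a + p - 1}.
    """
    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    k = len(next(iter(diffs.values())))
    B = {i: norm(v) / (1 + eps) ** (i / 2) for i, v in diffs.items()}
    lower = actual = upper = 0.0
    for i in range(a, a + p):
        same_class = [t for t in diffs if t % p == i % p]
        # terms outside A in the residue class of i (i is the only member inside A)
        rest = sum(B[t] for t in same_class if t != i)
        lower += max(0.0, B[i] - rest) ** 2
        upper += (B[i] + rest) ** 2
        # the actual group vector Phi_j(x) - Phi_j(y) for j = i mod p
        group = [sum(diffs[t][c] / (1 + eps) ** (t / 2) for t in same_class)
                 for c in range(k)]
        actual += norm(group) ** 2
    return lower, actual, upper
```

For any input, `lower <= actual <= upper` holds up to floating point error; the substance of the proof below lies in showing that the two bounds are close for the particular $\varphi_i$ at hand.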



The proof of Theorem 1.2 proceeds by demonstrating that, for an appropriate choice of $A$ (meaning $p$ and $a$), the leading term in the above summation ($B_i$ for $i \in A$) dominates the sum of the other terms of the summation (terms $B_{i'}$ for $i' \in I \setminus A$ with $i' \equiv_p i$). Fix $x,y \in S$, and let $i^* \in I$ be such that $(1+\epsilon)^{i^*} \le \|x-y\| < (1+\epsilon)^{i^*+1}$. We wish to apply Lemma 3.4. To this end, let $A = \{i^*-p/2+1, \ldots, i^*+p/2\}$ and consider $i \in A$. Observe that

\[
\delta \le (1+\epsilon)^{-p/2} \le (1+\epsilon)^{i^*-i} \le \frac{\|x-y\|}{(1+\epsilon)^i} < (1+\epsilon)^{i^*+1-i} \le (1+\epsilon)^{p/2} \le \frac{1}{\delta},
\]

hence we can apply Theorem 3.1(b) to obtain

\[
\frac{1}{1+\epsilon} \le \frac{\|\varphi_i(x)-\varphi_i(y)\|}{G_{(1+\epsilon)^i}(\|x-y\|)} \le 1. \qquad (11)
\]



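Combining this with the monotonicity of $G_r$ and Lemma 3.3, and using a lower bound on $G_r$ that holds when $\epsilon < \frac14$, we further obtain

\[
\frac{\|\varphi_i(x)-\varphi_i(y)\|}{(1+\epsilon)^{i/2}} \ge \frac{G_{(1+\epsilon)^i}(\|x-y\|)}{(1+\epsilon)^{i/2}(1+\epsilon)} \ge \tfrac12 (1+\epsilon)^{i/2}. \qquad (12)
\]

By Theorem 3.1(a) and (c), for all $i' \in I$,

\[
\|\varphi_{i'}(x)-\varphi_{i'}(y)\| \le \min\{\|x-y\|,\ (1+\epsilon)^{i'}\},
\]

and thus

\[
\begin{aligned}
\sum_{i' \in I \setminus A:\ i' \equiv_p i} \frac{\|\varphi_{i'}(x)-\varphi_{i'}(y)\|}{(1+\epsilon)^{i'/2}}
&\le \sum_{i' < i:\ i' \equiv_p i} \frac{\|\varphi_{i'}(x)-\varphi_{i'}(y)\|}{(1+\epsilon)^{i'/2}} + \sum_{i' > i:\ i' \equiv_p i} \frac{\|\varphi_{i'}(x)-\varphi_{i'}(y)\|}{(1+\epsilon)^{i'/2}} \\
&\le \sum_{i' < i:\ i' \equiv_p i} (1+\epsilon)^{i'/2} + \sum_{i' > i:\ i' \equiv_p i} \frac{\|x-y\|}{(1+\epsilon)^{i'/2}}.
\end{aligned}
\]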
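Recalling that a geometric series with ratio less than $\frac12$ sums to less than twice the largest term,

\[
\begin{aligned}
\sum_{i' \in I \setminus A:\ i' \equiv_p i} \frac{\|\varphi_{i'}(x)-\varphi_{i'}(y)\|}{(1+\epsilon)^{i'/2}}
&\le 2(1+\epsilon)^{(i-p)/2} + 2\|x-y\|(1+\epsilon)^{-(i+p)/2} \\
&\le 2(1+\epsilon)^{i/2} \Big[ (1+\epsilon)^{-p/2} + \frac{\|x-y\|}{(1+\epsilon)^{i}} (1+\epsilon)^{-p/2} \Big] \\
&\le \epsilon (1+\epsilon)^{i/2}.
\end{aligned}
\]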

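Observe that the last bound is at most $2\epsilon$ times (12). Using this information and plugging (11) into Lemma 3.4, we obtain

\[
\begin{aligned}
\|\Phi(x)-\Phi(y)\|^2
&\ge \sum_{-p/2 < i-i^* \le p/2} \Big( \frac{1-2\epsilon}{1+\epsilon} \cdot \frac{G_{(1+\epsilon)^i}(\|x-y\|)}{(1+\epsilon)^{i/2}} \Big)^2 \\
&\ge \sum_{-p/2 < i-i^* \le p/2} \Big( \frac{1-2\epsilon}{1+\epsilon} \cdot (1+\epsilon)^{i/2}\, G\big((1+\epsilon)^{i^*-i}\big) \Big)^2 \\
&\ge (1-2\epsilon)^2 (1+\epsilon)^{i^*-2} \sum_{b:\ -p/2 < b \le p/2} (1+\epsilon)^{b}\, G\big((1+\epsilon)^{-b}\big)^2,
\end{aligned}
\]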



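and similarly, using also Lemma 3.3,

\[
\begin{aligned}
\|\Phi(x)-\Phi(y)\|^2
&\le \sum_{-p/2 < i-i^* \le p/2} \Big( (1+2\epsilon) \cdot \frac{G_{(1+\epsilon)^i}(\|x-y\|)}{(1+\epsilon)^{i/2}} \Big)^2 \\
&\le \sum_{-p/2 < i-i^* \le p/2} \Big( (1+2\epsilon)(1+4\epsilon)(1+\epsilon)^{i/2}\, G\big((1+\epsilon)^{i^*-i}\big) \Big)^2 \\
&\le (1+4\epsilon)^4 (1+\epsilon)^{i^*} \sum_{b:\ -p/2 < b \le p/2} (1+\epsilon)^{b}\, G\big((1+\epsilon)^{-b}\big)^2.
\end{aligned}
\]

Setting $M = \sum_{-p/2 < b \le p/2} (1+\epsilon)^{b}\, G\big((1+\epsilon)^{-b}\big)^2$, which clearly depends only on $\epsilon$ (and is in particular independent of $x,y$), we combine the last two estimates with $(1+\epsilon)^{i^*} \le \|x-y\| < (1+\epsilon)^{i^*+1}$ to obtain

\[
\frac{(1-2\epsilon)^2}{(1+\epsilon)^3} \le \frac{\|\Phi(x)-\Phi(y)\|^2}{M\,\|x-y\|} \le (1+4\epsilon)^4.
\]

We conclude that the final embedding $\Phi/\sqrt{M}$ achieves distortion $1 + O(\epsilon)$ for $\alpha = 1/2$.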

Arbitrary $0 < \alpha < 1$. Turning to proving the theorem for arbitrary values of $0 < \alpha < 1$, we repeat the previous construction and proof with $p = \lceil \frac{3}{1-\alpha} \log_{1+\epsilon}(1/\epsilon) \rceil$ and $\delta = (1+\epsilon)^{-p(1-\alpha)} = O(\epsilon^3)$. As before, $\varphi_i : S \to \ell_2^k$ is the embedding that achieves the bounds of Theorem 1.3 for $S$, and $\Phi : S \to \ell_2^{pk}$ is defined by the direct sum $\Phi = \bigoplus_{j \in [p]} \Phi_j$, where each $\Phi_j : S \to \ell_2^k$ is given by

\[
\Phi_j = \sum_{i \in I:\ i \equiv_p j} \frac{\varphi_i}{(1+\epsilon)^{i(1-\alpha)}}.
\]

The final embedding is $\Phi/\sqrt{M} : S \to \ell_2^{pk}$ (for the same $M$ as above), which has target dimension $pk \le O(\epsilon^{-4}(1-\alpha)^{-1}\log^2\lambda)$, as required.
We need to make only small changes to the preceding proof of distortion: In the statement and proof of Lemma 3.4, the dividing terms $(1+\epsilon)^{i/2}$ and $(1+\epsilon)^{i'/2}$ are replaced by $(1+\epsilon)^{i(1-\alpha)}$ and $(1+\epsilon)^{i'(1-\alpha)}$, respectively. The same substitution is made to modify Equation (12) and obtain $\frac{\|\varphi_i(x)-\varphi_i(y)\|}{(1+\epsilon)^{i(1-\alpha)}} \ge \frac12 (1+\epsilon)^{i(1-\alpha)}$, and to the subsequent geometric series, from which we derive $\sum_{i' \in I \setminus A:\ i' \equiv_p i} \frac{\|\varphi_{i'}(x)-\varphi_{i'}(y)\|}{(1+\epsilon)^{i'(1-\alpha)}} \le \epsilon (1+\epsilon)^{i(1-\alpha)}$. (Note that the geometric series still has a ratio of less than $\frac12$, due to the increase in value of $p$.) No other changes to the proof are necessary, and this completes the proof of Theorem 1.2.



4 Extension to Other $\ell_p$ Spaces

We briefly explain how our results and techniques can be extended to other $\ell_p$ spaces. For concreteness, we consider only two important spaces, namely $\ell_1$ and $\ell_\infty$. A number of key tools used in our previous embeddings are specific to $\ell_2$, for example the JL-Lemma, the Gaussian transform, and the Kirszbraun theorem, and we must therefore find suitable replacements for these tools. Note however that there is no Lipschitz extension theorem for $\ell_1$.

The primary result of this section is a variant of our snowflake embedding, Theorem 1.2. We note that the snowflake operator is necessary in this theorem, as for $\alpha = 1$ (and either $p = 1$ or $p = \infty$) Lee, Mendel and Naor [LMN05, Theorem 1.3] have shown that the target dimension cannot be bounded as a function of $\lambda(S)$, independently of $|S|$.

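Theorem 4.1. Let $0 < \epsilon < 1/4$, $0 < \alpha < 1$ and $p \in \{1, \infty\}$. Every finite subset $S \subset \ell_p$ with $\lambda = \lambda(S)$ admits an embedding $\Phi : S \to \ell_p^k$ satisfying

\[
1 \le \frac{\|\Phi(x)-\Phi(y)\|_p}{\|x-y\|_p^{\alpha}} \le 1+\epsilon, \qquad \forall x,y \in S,
\]

with $k = (1-\alpha)^{-1} \exp\{\lambda^{O(\log(1/\epsilon)+\log\log\lambda)}\}$ for $p = 1$, and $k = (1-\alpha)^{-1} \lambda^{O(\log(1/\epsilon)+\log\log\lambda)}$ for $p = \infty$.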



Recall that our (refined) single scale embedding for $\ell_2$ (Theorem 3.1), coupled with an application of Assouad's technique, was sufficient to prove Theorem 1.2. Similarly, single scale embeddings for $\ell_1$ and $\ell_\infty$, coupled with a standard application of Assouad's technique, are sufficient to prove Theorem 4.1. We present single scale embeddings for $\ell_1$ and $\ell_\infty$ below, and Theorem 4.1 then follows easily.



4.1 Single Scale Embedding for $\ell_1$

We can extend Theorem 3.1 to $\ell_1$ spaces as follows. For $r > 0$ define $L_r : \mathbb{R} \to \mathbb{R}$, called the Laplace distance transform, by $L_r(t) = r(1 - e^{-t/r})$. Observe that $L_r(t) = r \cdot G(\sqrt{t/r})^2$.
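
The identity relating the two transforms is easy to check numerically. The sketch below assumes the unit-scale Gaussian transform has the form $G(s) = (1 - e^{-s^2})^{1/2}$, which is the form implied by the identity itself; the function names are ours.

```python
import math

def gaussian_transform(s):
    # Assumed form of the unit-scale Gaussian transform: G(s) = sqrt(1 - exp(-s^2)).
    return math.sqrt(1.0 - math.exp(-s * s))

def laplace_transform(r, t):
    # Laplace distance transform: L_r(t) = r * (1 - exp(-t / r)).
    return r * (1.0 - math.exp(-t / r))

def laplace_via_gaussian(r, t):
    # Right-hand side of the identity L_r(t) = r * G(sqrt(t / r))^2.
    return r * gaussian_transform(math.sqrt(t / r)) ** 2
```

Both expressions agree for every $r > 0$ and $t \ge 0$. Note also that $L_r(t) \le \min\{t, r\}$, so $L_r$ is 1-Lipschitz near the origin and saturates at the scale $r$, in the same way that $G_r$ does.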

Theorem 4.2. For every scale $r > 0$ and every $0 < \delta, \epsilon < 1/4$, every finite set $S \subset \ell_1$ admits an embedding $\varphi : S \to \ell_1^k$ for $k = \exp\{\lambda^{O(\log(1/\epsilon\delta)+\log\log\lambda)}\}$, satisfying:

(a). Lipschitz: $\|\varphi(x)-\varphi(y)\|_1 \le \|x-y\|_1$ for all $x,y \in S$.

(b). $1+\epsilon$ distortion to the Laplace transform (at scales near $r$): For all $x,y \in S$ with $\delta r \le \|x-y\|_1 \le r/\delta$,

\[
\frac{1}{1+\epsilon} \le \frac{\|\varphi(x)-\varphi(y)\|_1}{L_r(\|x-y\|_1)} \le 1.
\]

(c). Boundedness: $\|\varphi(x)\|_1 \le r$ for all $x \in S$.



Proof Sketch. We would like to utilize the framework designed for $\ell_2$ in Section 3.1. However, a few problems arise. Let us point them out and explain how to solve them.

• Step 7: This step is not possible for the $\ell_1$ norm, since there is no $\ell_1$-analogue of the Kirszbraun theorem. Instead, we modify the entire construction (specifically, steps 2-6) so that it works with the entire data set $S$, not only with the net $N$. The effect of this will be seen shortly. (In $\ell_2$, the same approach of discarding step 7 can be achieved by applying the Kirszbraun Theorem separately in every cluster in step 4, but this approach does not seem to have any advantages.)

• Step 2: We apply a padded decomposition to the entire set $S$ (and not only to the net $N$) with essentially the same parameters and bounds. Thus, from now on each cluster $C \in P_j$ is a subset of $S$ (but not necessarily of $N$).

• Step 3: Instead of the Gaussian transform, we apply the Laplace transform $L_r$, i.e. $g_C$ now satisfies $\|g_C(x) - g_C(y)\|_1 = L_r(\|x-y\|_1)$ for all $x,y \in C$. Such an embedding $g_C : C \to \ell_1$ is known to exist, see [DL97, Corollary 9.1.3]. The effect is clearly quite similar to that of the Gaussian transform. The fact that $C$ is not a subset of $N$ is not an issue.

• Step 4: We need to find a weak analogue to the JL lemma, but there is an additional complication of having to deal with points not in the net $N$. Specifically, we need a map $\Psi : g_C(C) \to \ell_1^{k'}$ which satisfies: (i) $\Psi$ is 1-Lipschitz on the entire cluster $g_C(C)$; and (ii) $\Psi$ achieves $1+\epsilon$ distortion on the cluster net points $g_C(C \cap N)$. Observe that the former requirement is non-standard and does not follow from "standard" dimension reduction theorems for finite subsets of $\ell_1$. However, we describe below one simple dimension reduction for $\ell_1$ which does extend to our setting. (This construction was observed jointly with Gideon Schechtman.) We suspect that the dimension can be further reduced, since the current construction is an isometry on $g_C(C \cap N)$, and does not exploit the $1+\epsilon$ distortion allowed by requirement (ii). However, an improved map $\Psi$ cannot be linear, since in the worst case such a linear map requires dimension $k = 2^{\Omega(|C \cap N|)}$ [FJS91, Corollary 12.A].



Construct $\Psi$ as follows. Since the metric $g_C(C) \subset \ell_1$, it can be written as a conic combination of cut metrics, i.e. there are $\gamma_A \ge 0$ for $A \subset C$ such that

\[
\|g_C(x) - g_C(y)\|_1 = \sum_{A} \gamma_A\, |1_A(x) - 1_A(y)|, \qquad \forall x,y \in C,
\]

where $1_A(x) = 1$ if $x \in A$ and $0$ otherwise. In other words, $g_C(x) \mapsto (\gamma_A 1_A(x))_{A \subset C}$ is an isometric embedding of $g_C(C)$ into $\ell_1$. Let $\Psi$ have one coordinate for every subset $B \subset C \cap N$; this coordinate is given by $g_C(x) \mapsto \sum_{A:\ A \cap (C \cap N) = B} \gamma_A 1_A(x)$. In words, we add together coordinates whenever they correspond to different $A$ but have the same $A \cap (C \cap N)$. Observe that $\Psi$ is 1-Lipschitz for all $g_C(x)$, $x \in C$, simply because adding two coordinates together can only decrease distances, and that it is isometric for $g_C(x)$, $x \in C \cap N$, because now if coordinates corresponding to $A$ and $A'$ are added together then necessarily $1_A(x) = 1_{A'}(x)$ and similarly $1_A(y) = 1_{A'}(y)$. Observe that $k' = 2^{|C \cap N|}$ where $|C \cap N| \le \lambda^{O(\log(\Delta/\epsilon\delta r))}$.

• Step 5: There is only a minor change; since we do not restrict attention to net points, we now define $h_C(x) = \min_{y \in S \setminus C} \|x - y\|_1$.

Step 6: There is only a minor change to the scaling factor, namely ip = m~ l 
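
The coordinate-merging map $\Psi$ of Step 4 can be sketched as follows. The sketch is illustrative only: the cut decomposition is taken as given (a hypothetical list `cuts` of pairs $(A, \gamma_A)$ with $A$ a `frozenset` of points), `net` plays the role of $C \cap N$, and all function names are ours.

```python
def cut_coords(x, cuts):
    """Isometric l1 image of x: one coordinate gamma_A * 1_A(x) per cut A."""
    return [gamma if x in A else 0.0 for A, gamma in cuts]

def merged_coords(x, cuts, net):
    """Image of x under the merged map: one coordinate per trace B = A & net.

    Coordinates of all cuts A with the same trace on the net are added together.
    The key order depends only on (cuts, net), so images of different points
    are aligned coordinate by coordinate.
    """
    merged = {}
    for A, gamma in cuts:
        B = A & net
        merged[B] = merged.get(B, 0.0) + (gamma if x in A else 0.0)
    return list(merged.values())

def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))
```

Merging never increases $\ell_1$ distances, and for two net points every merged group contains only coordinates that agree on both points, so their distance is preserved exactly. The number of merged coordinates is at most $2^{|C \cap N|}$, independent of $|C|$.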
The rest of the proof is quite similar to the one presented for $\ell_2$, and the final dimension obtained is $mk' = O(\epsilon^{-1}\log\lambda\log\log\lambda) \cdot \exp(\lambda^{O(\log(\epsilon^{-1}\delta^{-2}\log\lambda))}) = \exp\{\lambda^{O(\log(1/\epsilon\delta)+\log\log\lambda)}\}$. □

4.2 Single Scale Embedding for $\ell_\infty$

We can also extend Theorem 3.1 to $\ell_\infty$ spaces as follows. For $r > 0$ define $T_r : \mathbb{R} \to \mathbb{R}$, called the threshold transform, by $T_r(s) = \min\{s, r\}$.

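Theorem 4.3. For every scale $r > 0$ and every $0 < \delta, \epsilon < 1/4$, every finite set $S \subset \ell_\infty$ admits an embedding $\varphi : S \to \ell_\infty^k$ for $k = \lambda^{O(\log(1/\epsilon\delta)+\log\log\lambda)}$, satisfying:

(a). Lipschitz: $\|\varphi(x)-\varphi(y)\|_\infty \le \|x-y\|_\infty$ for all $x,y \in S$.

(b). $1+\epsilon$ distortion to the threshold transform (at scales near $r$): For all $x,y \in S$ with $\delta r \le \|x-y\|_\infty \le r/\delta$,

\[
\frac{T_r(\|x-y\|_\infty)}{1+\epsilon} \le \|\varphi(x)-\varphi(y)\|_\infty \le T_r(\|x-y\|_\infty).
\]

(c). Boundedness: $\|\varphi(x)\|_\infty \le r$ for all $x \in S$.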



Proof Sketch. The proof is quite similar to that of Theorem 4.2, except for a few changes in some of the arguments.

• Step 3: The thresholding of distances is easily achieved by a simple variant of the well-known Fréchet embedding. Formally, $g_C$ has $|C|$ coordinates, one for every point $z \in C$, and that coordinate is given by $x \mapsto \min\{\|x - z\|_\infty, r\}$. It is easily verified that $\|g_C(x) - g_C(y)\|_\infty = T_r(\|x-y\|_\infty)$ for all $x,y \in C$.

• Step 4: The required bound is again obtained by a simple variant of the well-known Fréchet embedding. Formally, $\Psi : g_C(C) \to \ell_\infty^{k'}$ has one coordinate for every point $z \in g_C(C \cap N)$, and that coordinate is given by $t \mapsto \|t - z\|_\infty$. Thus, $k' = |C \cap N|$. It is easily verified that this map $\Psi$ is 1-Lipschitz on the entire cluster $g_C(C)$ and also isometric on $g_C(C \cap N)$.

• Step 6: The scaling factor is different and now 93 = ©2=i V 9 *- The resulting embedding 
is 1-Lipschitz: We consider the worst partition Pj (without averaging the partitions). Case 3 
in the Lipschitz analysis for £2 yields a Lipschitz constant of 1 + 5, and jj^g < 1- 
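
The two Fréchet-type maps of Steps 3 and 4 above can be sketched as follows. Points are represented as tuples of floats; the function names and the way the net images are passed in are illustrative assumptions rather than part of the construction's specification.

```python
def linf_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def threshold_frechet(x, cluster, r):
    """Step 3 map g_C: one coordinate min(||x - z||_inf, r) per point z of the cluster."""
    return [min(linf_distance(x, z), r) for z in cluster]

def frechet_on_net(t, net_images):
    """Step 4 map: one coordinate ||t - z||_inf per image z of a net point."""
    return [linf_distance(t, z) for z in net_images]
```

For $x, y$ in the cluster, the coordinate indexed by $z = y$ already attains $\min\{\|x-y\|_\infty, r\}$, and no coordinate can exceed it, which gives the thresholded isometry. The second map is 1-Lipschitz by the triangle inequality and is exact on pairs of net images for the same reason.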

We remark that it suffices to use a padded decomposition with padding probability $1/2$ (instead of $1 - \epsilon$), but asymptotically this change does not improve the dimension. The final dimension obtained is $mk' = O(\epsilon^{-1}\log\lambda\log\log\lambda) \cdot \lambda^{O(\log(\epsilon^{-1}\delta^{-2}\log\lambda))} = \lambda^{O(\log(1/\epsilon\delta)+\log\log\lambda)}$.

The lower bound on $\|\varphi(x)-\varphi(y)\|_\infty$ for a pair $x,y$ follows when $x$ is padded (case 1' in the distortion to the Gaussian analysis for $\ell_2$), where we have $\|\varphi(x)-\varphi(y)\|_\infty = T_r(\|x-y\|_\infty)$; we further stipulate without loss of generality that $\delta \le \epsilon^2/4$ so that the scaling factor is $1 + 2\sqrt{\delta} \le 1 + \epsilon$.

For the upper bound we have (from cases 2' and 3') $\|\varphi(x)-\varphi(y)\|_\infty \le \max\{2\delta\|x-y\|_\infty,\ \delta\|x-y\|_\infty + T_r(\|x-y\|_\infty)\} \le 2\delta\|x-y\|_\infty + T_r(\|x-y\|_\infty)$. We consider two possibilities:

1. $\delta r \le \|x-y\|_\infty \le r$. Then $T_r(\|x-y\|_\infty) = \|x-y\|_\infty$ and $2\delta\|x-y\|_\infty + T_r(\|x-y\|_\infty) = (1+2\delta)\,T_r(\|x-y\|_\infty)$.



2. $r < \|x-y\|_\infty \le r/\sqrt{\delta}$. Then $T_r(\|x-y\|_\infty) = r$ and $2\delta\|x-y\|_\infty + T_r(\|x-y\|_\infty) \le (1+2\sqrt{\delta})\,r = (1+2\sqrt{\delta})\,T_r(\|x-y\|_\infty)$.

The final result follows from the scaling factor in Step 6. □ 
5 Algorithmic applications



Here we illustrate the effectiveness and potential of our results for various algorithmic tasks by 
describing two immediate (theoretical) applications. 

Distance Labeling Scheme (DLS). Consider this problem for the family of $n$-point $\ell_2$ metrics with a given bound on the doubling dimension. As usual, we assume the interpoint distances are in the range $[1, R]$. Our snowflake embedding into $\ell_2^k$ (Theorem 1.2 for $\alpha = \frac12$) immediately provides a DLS with approximation $(1+\epsilon)^2 \le 1+3\epsilon$, simply by rounding each coordinate to a multiple of $\epsilon/2k$. We have:

Lemma 5.1. Every finite subset $S \subset \ell_2$ with $\lambda = \lambda(S)$ possesses a $(1+\epsilon)$-approximate distance labeling scheme with label size

k ■ log Jfa = 0(e~ 4 (dim S) 2 ) log R. 
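
The rounding step behind the labeling scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the embedded vector $\Phi(x) \in \ell_2^k$ is taken as given, the grid step is read as $\epsilon/(2k)$, and the function names and the integer label encoding are ours.

```python
import math

def make_label(vec, eps):
    """Label of a point: its embedded coordinates rounded to multiples of eps / (2k).

    Stored as integers (multiples of the grid step), which is what makes
    the label size depend on the range of the coordinates and on k only.
    """
    step = eps / (2 * len(vec))
    return [round(c / step) for c in vec]

def estimate_embedded_distance(label_x, label_y, eps):
    """Estimate ||Phi(x) - Phi(y)||_2 from the two labels alone."""
    step = eps / (2 * len(label_x))
    return math.sqrt(sum(((a - b) * step) ** 2 for a, b in zip(label_x, label_y)))
```

Each coordinate moves by at most half a grid step, so the estimate differs from $\|\Phi(x)-\Phi(y)\|_2$ by at most $\sqrt{k} \cdot \epsilon/(2k) \le \epsilon/2$ in absolute terms; since all distances are at least 1, this is a small relative error. For $\alpha = \frac12$ the estimate approximates $\|x-y\|^{1/2}$, so it is squared to recover an approximation of $\|x-y\|$.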

Notice that, apart from the $\log R$ term, this bound is independent of $n$. The published bounds of this form (see [HM06] and references therein) apply to the more general family of all doubling metrics (not necessarily Euclidean) but require exponentially larger label size, roughly $(1/\epsilon)^{O(\dim S)}$.

Approximation algorithms for clustering. Clustering problems are often defined as an optimization problem whose objective function is expressed in terms of distances between data points. For example, in the $k$-center problem one is given a metric $(S, d)$ and is asked to identify a subset of $k$ centers $C \subset S$ that minimizes the objective $\max_{x \in S} d(x, C)$. When the data set $S$ is Euclidean (and the centers are discrete, i.e. from $S$), one can apply our snowflake embedding (Theorem 1.2) and solve the problem in the target space, which has low dimension $k$. Indeed, it is easy to see how to map solutions from the original space to the target space and vice versa, with a loss of at most a $(1+\epsilon)^2 \le 1+3\epsilon$ factor in the objective.
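
To illustrate why solutions transfer between the two spaces, the sketch below solves discrete $k$-center by brute force under a metric $d$ and under its exact snowflake $d^{1/2}$. It is a toy illustration only: it uses the exact snowflake rather than the low-dimensional embedding (which would add the $1+\epsilon$ loss), the exhaustive search is exponential in $k$, and the function name is ours.

```python
import math
from itertools import combinations

def k_center_bruteforce(points, k, dist):
    """Return (cost, centers) minimizing max_x min_{c in centers} dist(x, c).

    centers is a tuple of indices into points; ties keep the first optimum found.
    """
    best = None
    for centers in combinations(range(len(points)), k):
        cost = max(min(dist(x, points[c]) for c in centers) for x in points)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def snowflake(u, v):
    # exact snowflake metric d^(1/2), standing in for the embedded distances
    return math.sqrt(euclidean(u, v))
```

Because $t \mapsto t^{1/2}$ is increasing, the max-min objective is transformed monotonically, so the same set of centers is optimal under both metrics and the optimal costs are related by a square root. With the actual embedding the correspondence holds up to the stated $(1+\epsilon)^2$ factor.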

For other clustering problems, like $k$-median or min-sum clustering, the objective function is the sum of certain distances. The argument above applies, except that now in the target space we need an algorithm that solves the problem with $\ell_2$-squared costs. For instance, to solve the $k$-median problem in the original space, we can use an algorithm for $k$-means in the target space. Schulman [Sch00] has designed algorithms for min-sum clustering under both $\ell_2$ and $\ell_2$-squared costs, and their running times depend exponentially on the dimension. The following lemma follows from our snowflake embedding and [Sch00, Propositions 14, 28]. For simplicity, we will assume that $k = O(1)$.

Lemma 5.2. Given a set of $n$ points $S \subset \mathbb{R}^d$, a $(1+\epsilon)$-approximation to the $\ell_2$ min-sum $k$-clustering for $S$, for $k = O(1)$, can be computed

1. in deterministic time n (d')2 2<0(ti H . 

2. in randomized time n '°( d ')2 2 ^°^ d ; n' = 0{e~ 2 log(<5 _1 n)) ; with probability 1 — 5. 
where $d' = \min\{d, O(\epsilon^{-4}\dim^2 S)\}$.